hierarchical softmax


A no-regret generalization of hierarchical softmax to extreme multi-label classification

Neural Information Processing Systems

Extreme multi-label classification (XMLC) is the problem of tagging an instance with a small subset of relevant labels chosen from an extremely large pool of possible labels. Large label spaces can be handled efficiently by organizing labels as a tree, as in the hierarchical softmax (HSM) approach commonly used for multi-class problems. In this paper, we investigate probabilistic label trees (PLTs), which have recently been devised for tackling XMLC problems. We show that PLTs are a no-regret multi-label generalization of HSM when precision@$k$ is used as the model evaluation metric. Critically, we prove that the pick-one-label heuristic---a reduction technique from multi-label to multi-class that is routinely used along with HSM---is not consistent in general. We also show that our implementation of PLTs, referred to as extremeText (XT), obtains significantly better results than HSM with the pick-one-label heuristic and XML-CNN, a deep network specifically designed for XMLC problems. Moreover, XT is competitive with many state-of-the-art approaches in terms of statistical performance, model size and prediction time, which makes it amenable to deployment in an online system.
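The tree mechanics the abstract describes can be sketched in a few lines. Below is a toy illustration, not the extremeText implementation: node probabilities are hard-coded, whereas a real PLT obtains them from trained per-node binary classifiers conditioned on the instance.

```python
import heapq

# Toy sketch of top-k prediction in a probabilistic label tree (PLT).
# Each node carries a conditional probability P(node relevant | parent
# relevant); a label's marginal probability is the product of the
# conditionals on its root-to-leaf path.

class Node:
    def __init__(self, prob, label=None, children=()):
        self.prob = prob            # conditional probability at this node
        self.label = label          # set only for leaf nodes
        self.children = list(children)

def top_k_labels(root, k):
    """Return the k labels with the highest marginal probability.

    Since every factor is <= 1, path products only shrink along a path,
    so a best-first search pops leaves in decreasing order of marginal
    probability: the first k leaves popped are exactly the top k.
    """
    heap = [(-root.prob, id(root), root)]
    result = []
    while heap and len(result) < k:
        neg_p, _, node = heapq.heappop(heap)
        p = -neg_p
        if node.label is not None:
            result.append((node.label, p))
            continue
        for child in node.children:
            heapq.heappush(heap, (-(p * child.prob), id(child), child))
    return result

# A tiny hand-built tree with four labels.
tree = Node(1.0, children=[
    Node(0.8, children=[Node(0.9, label="A"), Node(0.3, label="B")]),
    Node(0.5, children=[Node(0.6, label="C"), Node(0.4, label="D")]),
])
```

Here `top_k_labels(tree, 2)` returns A (0.8 * 0.9 = 0.72) and C (0.5 * 0.6 = 0.30); note that B, despite sitting under the higher-probability subtree, is correctly skipped.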



Reviews: A no-regret generalization of hierarchical softmax to extreme multi-label classification

Neural Information Processing Systems

Summary: This work investigates probabilistic label trees (PLTs) for solving extreme multi-label classification (XMLC). The theoretical analysis shows that PLT is a no-regret algorithm for precision@k, and the algorithmic contribution combines PLT with fastText to efficiently handle extreme multi-label text classification problems, using a clustering-based strategy for building the tree structure. The paper is comprehensive and well written, and includes extensive experiments. The theory part formally shows that a PLT outputting the k labels with the highest marginal probabilities is consistent with precision@k, given zero-regret node classifiers. The authors also provide two negative results on heuristic strategies: the pick-one-label heuristic is suboptimal in terms of precision@k, and building Huffman trees for PLTs does not minimize computational cost.
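The first negative result mentioned in the review is easy to illustrate numerically. The following toy example (my own construction, not one taken from the paper) exhibits a label distribution where the pick-one-label reduction ranks labels differently than the true marginals, so optimizing the reduced multi-class problem picks the wrong label for precision@1.

```python
from collections import defaultdict

# Distribution over relevant-label *sets* for a single instance:
# with prob 0.6 the labels {1, 2} are relevant, with prob 0.4 only {3}.
label_set_dist = {frozenset({1, 2}): 0.6, frozenset({3}): 0.4}

marginal = defaultdict(float)   # P(label j is relevant)
pick_one = defaultdict(float)   # multi-class dist. after pick-one-label
for labels, p in label_set_dist.items():
    for j in labels:
        marginal[j] += p
        pick_one[j] += p / len(labels)  # one relevant label picked uniformly

best_marginal = max(marginal, key=marginal.get)  # optimal for precision@1
best_pick_one = max(pick_one, key=pick_one.get)  # what HSM + pick-one targets
```

The marginals are P(1) = P(2) = 0.6 and P(3) = 0.4, so precision@1 is maximized by predicting label 1 or 2; but the pick-one-label distribution is (0.3, 0.3, 0.4), whose mode is label 3.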


Distributed Representations of Words and Phrases and their Compositionality

Neural Information Processing Systems

The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
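A minimal sketch of the negative-sampling update described above, using plain Python vectors for clarity. The real word2vec implementation differs in several ways it does not change the idea: flat float arrays, a unigram distribution raised to the 3/4 power for drawing noise words, and a precomputed sigmoid table.

```python
import math
import random

# Rough sketch of one skip-gram negative-sampling (SGNS) step, the
# "simple alternative to the hierarchical softmax" from the abstract.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sgns_update(in_vecs, out_vecs, center, context, noise_words, lr=0.025):
    """One SGD step: pull the true context word's output vector toward
    the center word's input vector, push sampled noise words away."""
    v = in_vecs[center]
    grad_v = [0.0] * len(v)
    pairs = [(context, 1.0)] + [(w, 0.0) for w in noise_words]
    for word, label in pairs:
        u = out_vecs[word]
        g = lr * (label - sigmoid(dot(u, v)))  # scalar gradient factor
        for i in range(len(v)):
            grad_v[i] += g * u[i]              # accumulate gradient for v
            u[i] += g * v[i]                   # update output vector in place
    for i in range(len(v)):
        v[i] += grad_v[i]

# Tiny demo: one real (center, context) pair vs. one noise word.
random.seed(0)
dim = 5
in_vecs = {"cat": [random.uniform(-0.5, 0.5) for _ in range(dim)]}
out_vecs = {w: [random.uniform(-0.5, 0.5) for _ in range(dim)]
            for w in ("sat", "xyz")}
score_before = sigmoid(dot(out_vecs["sat"], in_vecs["cat"]))
for _ in range(50):
    sgns_update(in_vecs, out_vecs, "cat", "sat", ["xyz"], lr=0.1)
score_after = sigmoid(dot(out_vecs["sat"], in_vecs["cat"]))
```

After the updates the model scores the observed pair ("cat", "sat") higher than before, while the noise word "xyz" is pushed below chance, which is exactly the discrimination task negative sampling trains.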


Global Hierarchical Neural Networks using Hierarchical Softmax

Schuurmans, Jetze, Frasincar, Flavius

arXiv.org Artificial Intelligence

This paper presents a framework in which hierarchical softmax is used to create a global hierarchical classifier. The approach is applicable for any classification task where there is a natural hierarchy among classes. We show empirical results on four text classification datasets. In all datasets the hierarchical softmax improved on the regular softmax used in a flat classifier in terms of macro-F1 and macro-recall. The paper is structured as follows. In Section 2 previous work on hierarchical classifiers and hierarchical softmax is covered. Our proposal for the hierarchical softmax is presented in Section 3. Then in Section 4 we describe several datasets and Section 5 discusses the experimental setup. In Section 6 we compare the results of models with a regular softmax and with a hierarchical softmax on these datasets.
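The idea in this abstract, a softmax at each internal node of the class hierarchy with a leaf class's probability given by the product of softmax factors along its root-to-leaf path, can be sketched as follows. The hierarchy and scores here are invented for illustration; in the paper's setting the scores would come from a trained model.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a dict of child scores."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}

# node -> raw scores for its children (in practice, a linear layer output)
tree_scores = {
    "root":     {"sports": 2.0, "politics": 0.5},
    "sports":   {"soccer": 1.0, "tennis": -1.0},
    "politics": {"elections": 0.0},
}

def leaf_probability(path):
    """path: nodes from root to leaf, e.g. ['root', 'sports', 'soccer'].
    The leaf's probability is the product of per-node softmax factors."""
    p = 1.0
    for parent, child in zip(path, path[1:]):
        p *= softmax(tree_scores[parent])[child]
    return p
```

Because each node's softmax sums to one over its children, the leaf probabilities form a proper distribution over the classes, which is what lets the hierarchical classifier be trained and evaluated globally.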


Learn NLP the Stanford Way -- Lesson 2

#artificialintelligence

In the previous post, we introduced NLP. To find out word meanings with the Python programming language, we used the NLTK package and worked our way into word embeddings using the gensim package and Word2vec. Since we only touched on the Word2vec technique at a 10,000-foot level, we are now going to dive deeper into the training method used to create a Word2vec model. Word2vec (Mikolov et al. 2013)[1][2] is not a single technique or algorithm. It's actually a family of neural network architectures and optimization techniques that can produce good results when learning embeddings for large datasets.


A no-regret generalization of hierarchical softmax to extreme multi-label classification

Wydmuch, Marek, Jasinska, Kalina, Kuznetsov, Mikhail, Busa-Fekete, Róbert, Dembczynski, Krzysztof

Neural Information Processing Systems

Extreme multi-label classification (XMLC) is the problem of tagging an instance with a small subset of relevant labels chosen from an extremely large pool of possible labels. Large label spaces can be handled efficiently by organizing labels as a tree, as in the hierarchical softmax (HSM) approach commonly used for multi-class problems. In this paper, we investigate probabilistic label trees (PLTs), which have recently been devised for tackling XMLC problems. We show that PLTs are a no-regret multi-label generalization of HSM when precision@$k$ is used as the model evaluation metric. Critically, we prove that the pick-one-label heuristic---a reduction technique from multi-label to multi-class that is routinely used along with HSM---is not consistent in general.


Reimagining Plutarch with NLP: Part 2

#artificialintelligence

Plutarch's Lives of the Noble Greeks and Romans, also called Parallel Lives or just Plutarch's Lives, is a series of biographies of famous Ancient Greeks and Romans, from Theseus and Lycurgus to Marcus Antonius. In this article / tutorial -- following the recently published Part 1 -- I will continue exploring this book using natural language processing techniques. To help with easy replication, I adapted the code to Google Colab and highlighted what is unique to the platform -- otherwise the entire code can be run locally on Python 3.6. The code is presented sequentially throughout the article and the link to the GitHub files is embedded at the end, as I may skip some minor details or supplementary code. The text used in this analysis has been made available by Project Gutenberg.